# Multimodal speech recognition
Gemma 3 4b It Speech
Gemma-3-MM is a multimodal instruction model that extends Gemma-3-4b-it with speech processing. It accepts text, image, and audio inputs and generates text outputs.
Audio-to-Text
Transformers
junnei
383
12
© 2025
AIbase